30 research outputs found
Reliability in the face of variability in nanometer embedded memories
In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure - embedded memories. As variations manifest as a spread in power and performance, as a first step, we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design stage ensure that conflicting requirements from higher levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses novel fully-digital variation tracking hardware based on embedded DRAM (eDRAM) cells to monitor runtime changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate the optimal body-bias needed to meet the required yield targets. A novel variation-tolerant and soft-error hardened eDRAM cell is also proposed as an alternative candidate for replacing existing SRAM-based designs in latency-critical memory structures. In the ultra-low-power domain, where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design choices optimized simultaneously for the multiple metrics needed to meet lifetime requirements.
vPROBE: Variation aware post-silicon power/performance binning using embedded 3T1D cells
In this paper, we present an on-die post-silicon binning methodology that takes into account the effect of static and dynamic variations and categorizes every processor based on power/performance. The proposed scheme is composed of discretization hardware that exploits the characteristic dependence of delay and leakage on variability sources for categorization.
Dynamic fine-grain body biasing of caches with latency and leakage 3T1D-based monitors
In this paper, we propose a dynamically tunable fine-grain body biasing mechanism to reduce active and standby leakage power in caches under process variations.
Energy vs. Reliability Trade-offs Exploration in Biomedical Ultra-Low Power Devices
State-of-the-art wearable devices such as embedded biomedical monitoring systems apply voltage scaling to lower their energy consumption as much as possible and achieve longer battery lifetimes. While embedded memories often rely on Error Correction Codes (ECC) for error protection, in this paper we explore how the characteristics of biomedical applications can be exploited to develop new techniques with lower power overhead. We then introduce the Dynamic eRror compEnsation And Masking (DREAM) technique, which provides partial memory protection with lower area and power overheads than ECC. Different trade-offs between the error correction ability of the techniques and their energy consumption are examined to conclude that, when properly applied, DREAM consumes 21% less energy than a traditional ECC with Single Error Correction and Double Error Detection (SEC/DED) capabilities.
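For reference, the SEC/DED baseline that DREAM is compared against can be illustrated with a minimal Hamming(8,4) sketch: a Hamming(7,4) code extended with an overall parity bit, so single-bit errors are corrected and double-bit errors are detected. The function names and the 4-bit word size are illustrative, not taken from the paper:

```python
def secded_encode(nibble):
    """Encode a 4-bit value with Hamming(7,4) plus an overall parity bit (SEC/DED)."""
    d = [(nibble >> i) & 1 for i in range(4)]           # data bits, LSB first
    # Positional Hamming layout: codeword positions 1..7, parity at 1, 2, 4;
    # index 0 holds the overall parity over the whole codeword.
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d[0], d[1], d[2], d[3]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code) % 2                             # make total parity even
    return code

def secded_decode(code):
    """Return (nibble, status), status in {'ok', 'corrected', 'double'}."""
    syndrome = 0
    for pos in range(1, 8):                             # XOR of set positions
        if code[pos]:
            syndrome ^= pos
    parity_err = (sum(code) % 2) != 0
    if syndrome and parity_err:                         # single error: correctable
        code = list(code)
        code[syndrome] ^= 1
        status = 'corrected'
    elif syndrome and not parity_err:                   # two errors: detect only
        return None, 'double'
    elif not syndrome and parity_err:                   # overall parity bit flipped
        status = 'corrected'
    else:
        status = 'ok'
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

The encode/decode pair already shows where the ECC overhead comes from: every 4 data bits carry 4 check bits, and every read pays for the syndrome computation.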
Mitigating the Impact of Faults in Unreliable Memories for Error-Resilient Applications
Inherently error-resilient applications in areas such as signal processing, machine learning and data analytics provide opportunities for relaxing reliability requirements, and thereby reducing the overheads incurred by conventional error correction schemes. In this paper, we exploit the tolerable imprecision of such applications by designing an energy-efficient fault-mitigation scheme for unreliable memories to meet target yield. The proposed approach uses a bit-shuffling mechanism to isolate faults into bit locations with lower significance. By doing so, the bit-error distribution is skewed towards the low-order bits, substantially limiting the output error magnitude. By controlling the granularity of the shuffling, the proposed technique enables trading off quality for power, area and timing overhead. Compared to error-correction codes, this can reduce the overhead by as much as 83% in power, 89% in area, and 77% in access time when applied to various data mining applications in 28nm process technology.
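The bit-shuffling idea can be sketched as a per-word permutation that places the logical low-order bits into the known-faulty cells, so that whatever those cells corrupt has small numerical significance. The 8-bit width, the permutation construction and the function names below are illustrative assumptions, not the paper's actual design:

```python
def make_perm(faulty_cells, width=8):
    """perm[i] = physical cell that stores logical bit i (LSB first).
    Faulty cells receive the least significant logical bits, so any
    error they introduce has a small magnitude."""
    faulty = sorted(set(faulty_cells))
    healthy = [c for c in range(width) if c not in faulty]
    return faulty + healthy

def store(value, perm):
    """Permute a logical value into its physical bit layout."""
    word = 0
    for i, cell in enumerate(perm):
        if (value >> i) & 1:
            word |= 1 << cell
    return word

def load(word, perm):
    """Inverse-permute a physical word back into the logical value."""
    value = 0
    for i, cell in enumerate(perm):
        if (word >> cell) & 1:
            value |= 1 << i
    return value
```

With a faulty MSB cell (physical bit 7), a flip in that cell after shuffling corrupts only the logical LSB, bounding the error magnitude to 1 instead of 128.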
Approximate Computing with Unreliable Dynamic Memories
Embedded memories account for a large fraction of the overall silicon area and power consumption in modern SoCs. While embedded memories are typically realized with SRAM, alternative solutions, such as embedded dynamic memories (eDRAM), can provide higher density and/or reduced power consumption. One major challenge that impedes the widespread adoption of eDRAM is that it requires frequent refreshes, which potentially reduce the availability of the memory in periods of high activity and also consume a significant amount of power. Reducing the refresh rate lowers this power overhead, but if refreshes are not performed in a timely manner, some cells can lose their content, potentially resulting in memory errors. In this paper, we consider extending the refresh period of gain-cell based dynamic memories beyond the worst-case point of failure, assuming that the resulting errors can be tolerated when the use-cases are in the domain of inherently error-resilient applications. For example, we observe that for various data mining applications, a large number of memory failures can be accepted with tolerable imprecision in output quality. In particular, our results indicate that by allowing as many as 177 errors in a 16kB memory, the maximum loss in output quality is 11%. We use this failure limit to study the impact of relaxing reliability constraints on memory availability and retention power for different technologies.
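The retention-versus-refresh trade-off described above can be sketched numerically, assuming a hypothetical per-cell retention-time distribution; the lognormal parameters and function names are illustrative stand-ins, not measured values from the paper:

```python
import random

def sample_retention_times(n_cells, seed=0):
    """Hypothetical per-cell retention times (in microseconds). Real
    distributions are measured per technology; this is only a stand-in."""
    rng = random.Random(seed)
    return [rng.lognormvariate(4.0, 0.5) for _ in range(n_cells)]

def failing_cells(refresh_period_us, retention_us):
    """Count cells that decay before being refreshed."""
    return sum(1 for t in retention_us if t < refresh_period_us)

def max_refresh_period(retention_us, error_budget):
    """Longest refresh period that keeps failures within the error budget:
    refresh just before the (error_budget + 1)-th weakest cell decays."""
    return sorted(retention_us)[error_budget]
```

Applied to a 16kB memory (131072 bit cells) with the abstract's budget of 177 tolerable errors, this kind of scan shows how far the refresh period can be stretched beyond the single worst-case cell.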
Assessing the Effects of Low Voltage in Branch Prediction Units
Branch prediction units are key performance components in modern
microprocessors as they are widely used to address control hazards and
minimize misprediction stalls. The continuous urge of high performance
has led designers to integrate highly sophisticated predictors with
complex prediction algorithms and large storage requirements. As a
result, BPUs in modern microprocessors consume large amounts of power.
But when a system is under a limited power budget, critical decisions
are required in order to achieve an equilibrium point between the BPU
and the rest of the microprocessor.
In this work, we present a comprehensive analysis of the effects of
low-voltage operation on Branch Prediction Units (BPUs). We propose a
design with a separate voltage domain for the BPU, which exploits the
speculative, self-correcting nature of the BPU to reduce power without
affecting functional correctness. Our study explores how several branch
predictor implementations behave when aggressively undervolted, the
performance impact of the BTB, as well as the cases in which it is more
efficient to reduce the BP and BTB size instead of undervolting.
We also show that protection of BPU SRAM arrays has limited potential to
further increase the energy savings, showcasing a realistic protection
implementation. Our results show that BPU undervolting can result in
power savings up to 69%, while the microprocessor energy savings can be
up to 12%, before the penalty of the performance degradation overcomes
the benefits of low voltage. Neither smaller predictor sizes nor
protection mechanisms can further improve energy consumption.
Analysis and Characterization of Ultra Low Power Branch Predictors
Branch predictors are widely used to boost the performance of
microprocessors. However, this comes at the expense of power because
accurate branch prediction requires simultaneous access to several large
tables on every fetch. Consumed power can be drastically reduced by
operating the predictor under sub-nominal voltage levels (undervolting)
using a separate voltage domain. Faulty behavior resulting from
undervolting the predictor arrays impacts performance due to additional
mispredictions but does not compromise system reliability or functional
correctness. In this work, we explore how two well established branch
predictors (Tournament and L-Tage) behave when aggressively undervolted
below minimum fault-free supply voltage (V-min). Our results based on
fault injection and performance simulations show that both predictors
significantly reduce their power consumption by more than 63% and can
deliver a peak 6.4% energy savings in the overall system, without
observable performance degradation. However, energy consumption can
increase for both predictors due to extra mispredictions if
undervolting becomes too aggressive.
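The fault-injection methodology can be illustrated with a much simpler predictor than Tournament or L-Tage: a bimodal table of 2-bit saturating counters, with random bit flips standing in for undervolting faults. The model, table size and flip probability below are illustrative, not the paper's simulation setup:

```python
import random

def predict_and_train(branch_outcomes, table_bits=10, flip_prob=0.0, seed=0):
    """Bimodal predictor (2-bit saturating counters) with random bit flips
    injected into the counters to mimic undervolting faults (sketch).
    branch_outcomes: iterable of (pc, taken) pairs. Returns accuracy."""
    rng = random.Random(seed)
    table = [2] * (1 << table_bits)           # initialize to weakly taken
    correct = 0
    total = 0
    for pc, taken in branch_outcomes:
        idx = pc & ((1 << table_bits) - 1)
        ctr = table[idx]
        if flip_prob and rng.random() < flip_prob:
            ctr ^= 1 << rng.randrange(2)      # fault: flip one counter bit
        pred = ctr >= 2                       # MSB of the counter predicts
        correct += (pred == taken)
        total += 1
        # normal training update on the (possibly corrupted) counter
        table[idx] = min(3, ctr + 1) if taken else max(0, ctr - 1)
    return correct / total
```

A flipped counter bit only causes extra mispredictions, after which normal training repairs the entry, which is exactly why undervolting such arrays degrades performance but never functional correctness.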
On the effectiveness of hybrid mechanisms on reduction of parametric failures in caches
In this paper, we provide insight into different proactive read/write assist methods (wordline boosting and adaptive body biasing) that help prevent (and reduce) parametric failures when coupled with reactive techniques, such as ECC and redundancy, which cope with already existent failures. While proactive and reactive techniques have previously been viewed as complementary, we show that this is not necessarily the case when considering the benefits of such hybrid schemes.